Identifying Similar Words and Contexts in Natural Language with SenseClusters

نویسندگان

  • Ted Pedersen
  • Anagha Kulkarni
چکیده

SenseClusters is a freely available intelligent system that clusters together similar contexts in natural language text. Thereafter it assigns identifying labels to these clusters based on their content. It is a purely unsupervised approach that is language independent, and uses no knowledge other than what is available in raw un-annotated corpora. In addition to clustering similar contexts, it can be used to identify synonyms and sets of related words. It has been applied to a diverse range of problems, including proper name disambiguation, word sense discrimination, email organization, and document clustering. SenseClusters is a complete system that supports feature selection from large corpora, several different context representation schemes, various clustering algorithms, the creation of descriptive and discriminating labels for the discovered clusters, and evaluation relative to gold standard data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

SenseClusters: Unsupervised Clustering and Labeling of Similar Contexts

SenseClusters is a freely available system that identifies similar contexts in text. It relies on lexical features to build first and second order representations of contexts, which are then clustered using unsupervised methods. It was originally developed to discriminate among contexts centered around a given target word, but can now be applied more generally. It also supports methods that cre...

متن کامل

Optimizing question answering systems by Accelerated Particle Swarm Optimization (APSO)

One of the most important research areas in natural language processing is Question Answering Systems (QASs). Existing search engines, with Google at the top, have many remarkable capabilities. But there is a basic limitation (search engines do not have deduction capability), a capability which a QAS is expected to have. In this perspective, a search engine may be viewed as a semi-mechanized QA...

متن کامل

A Supervised Method of Feature Weighting for Measuring Semantic Relatedness

The clustering of related words is crucial for a variety of Natural Language Processing applications. Many known techniques of word clustering use the context of a word to determine its meaning. Words which frequently appear in similar contexts are assumed to have similar meanings. Word clustering usually applies the weighting of contexts, based on some measure of their importance. One of the m...

متن کامل

An Unsupervised Vector Approach to Biomedical Term Disambiguation: Integrating UMLS and Medline

This paper introduces an unsupervised vector approach to disambiguate words in biomedical text that can be applied to all-word disambiguation. We explore using contextual information from the Unified Medical Language System (UMLS) to describe the possible senses of a word. We experiment with automatically creating individualized stoplists to help reduce the noise in our dataset. We compare our ...

متن کامل

Word meaning in context: a probabilistic model and its application to question answering

The need for assessing similarity in meaning is central to most language technology applications. Distributional methods are robust, unsupervised methods which achieve high performance on this task. These methods measure similarity of word types solely based on patterns of word occurrences in large corpora, following the intuition that similar words occur in similar contexts. As most Natural La...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005